Variance Reduced Stochastic Gradient Descent with Sufficient Decrease

Authors

  • Fanhua Shang
  • Yuanyuan Liu
  • James Cheng
  • Kelvin Kai Wing Ng
  • Yuichi Yoshida
Abstract

In this paper, we propose a novel sufficient decrease technique for variance reduced stochastic gradient descent methods such as SAG, SVRG and SAGA. To achieve sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient that scales the current iterate so as to satisfy the sufficient decrease property; this coefficient decides whether to shrink, expand, or move in the opposite direction, and we give two specific update rules for this coefficient, one for Lasso and one for ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems and show that both algorithms attain linear convergence rates. We also provide convergence guarantees for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.
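A minimal sketch of the idea, assuming a ridge-regression objective and written in plain NumPy: a standard SVRG loop whose iterate is rescaled by a coefficient theta chosen to minimize the objective along the ray through the current point, so the update can shrink, expand, or reverse the iterate. The function name svrg_sd_ridge, the once-per-epoch placement of the scaling step, and the particular closed form used are illustrative assumptions, not the paper's precise SVRG-SD algorithm.

    import numpy as np

    def svrg_sd_ridge(A, b, lam, step, epochs=20, seed=0):
        """Illustrative SVRG loop for ridge regression,
        F(x) = (1/2n) * ||A x - b||^2 + (lam/2) * ||x||^2,
        with a sufficient-decrease style rescaling of the iterate.
        The scaling rule here is the exact minimizer of F(theta * x) over theta,
        which may shrink, expand, or reverse the iterate; the paper's own
        SVRG-SD update rules may differ in detail."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x_snap = np.zeros(d)                                     # snapshot point x~

        for _ in range(epochs):
            g_snap = A.T @ (A @ x_snap - b) / n + lam * x_snap   # full gradient at x~
            x = x_snap.copy()
            for _ in range(n):
                i = rng.integers(n)
                a_i = A[i]
                # standard SVRG variance-reduced gradient estimator
                g_i = (a_i @ x - b[i]) * a_i + lam * x
                g_i_snap = (a_i @ x_snap - b[i]) * a_i + lam * x_snap
                x -= step * (g_i - g_i_snap + g_snap)

            # sufficient-decrease scaling: theta = argmin_t F(t * x)
            Ax = A @ x
            denom = Ax @ Ax / n + lam * (x @ x)
            if denom > 0:
                theta = (b @ Ax / n) / denom      # closed form for ridge regression
                x = theta * x                     # shrink, expand, or flip direction
            x_snap = x
        return x_snap

For Lasso, the analogous one-dimensional subproblem gains a |theta| term from the l1 penalty, so its closed form takes a soft-thresholding shape rather than a simple ratio.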


Similar articles

Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization

In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. To achieve sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introdu...


Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast conv...
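The control-variate construction behind SVRG can be checked numerically in a few lines. The sketch below assumes a synthetic least-squares objective and verifies that the estimator g_i(x) - g_i(x~) + grad F(x~) averages to the full gradient while having far lower variance than a plain stochastic gradient when x is close to the snapshot x~.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 200, 5
    A = rng.normal(size=(n, d))
    b = rng.normal(size=n)

    def grad_i(x, i):            # per-sample gradient of (1/2) * (A[i] @ x - b[i])^2
        return (A[i] @ x - b[i]) * A[i]

    def full_grad(x):
        return A.T @ (A @ x - b) / n

    x_snap = rng.normal(size=d)              # snapshot point x~
    x = x_snap + 0.1 * rng.normal(size=d)    # current iterate near the snapshot

    g_snap = full_grad(x_snap)
    vr = np.array([grad_i(x, i) - grad_i(x_snap, i) + g_snap for i in range(n)])

    # averaging over all samples recovers the exact full gradient (unbiasedness)
    print(np.allclose(vr.mean(axis=0), full_grad(x)))            # True

    # the variance-reduced estimator fluctuates far less than plain SGD here
    sgd = np.array([grad_i(x, i) for i in range(n)])
    print(vr.var(axis=0).sum(), "<", sgd.var(axis=0).sum())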


Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on nonconvex optimization. Recent studies have shown that the asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge with a linear convergent rate on convex problems. However, there is no work to analyze asy...


Extensions of the Hestenes-Stiefel and Polak-Ribiere-Polyak conjugate gradient methods with sufficient descent property

Using search directions of a recent class of three-term conjugate gradient methods, modified versions of the Hestenes-Stiefel and Polak-Ribiere-Polyak methods are proposed which satisfy the sufficient descent condition. The methods are shown to be globally convergent when the line search fulfills the (strong) Wolfe conditions. Numerical experiments are done on a set of CUTEr unconstrained opti...
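As a concrete instance of a descent-preserving direction (a generic descent-modified PRP-type three-term recipe, not necessarily the exact class extended in that work), the sketch below builds the direction and verifies the sufficient descent identity g^T d = -||g||^2 numerically.

    import numpy as np

    def ttprp_direction(g_new, g_old, d_old):
        """Three-term PRP-type search direction
            d_new = -g_new + beta * d_old - theta * y,   y = g_new - g_old,
        with beta = g_new @ y / ||g_old||^2 and theta = g_new @ d_old / ||g_old||^2.
        By construction g_new @ d_new = -||g_new||^2, so sufficient descent holds
        with constant c = 1 (one known recipe; the cited paper studies its own
        extensions of HS and PRP)."""
        y = g_new - g_old
        gg = g_old @ g_old
        beta = (g_new @ y) / gg
        theta = (g_new @ d_old) / gg
        return -g_new + beta * d_old - theta * y

    # numerical check of the sufficient descent identity on random data
    rng = np.random.default_rng(0)
    g_old, g_new, d_old = rng.normal(size=(3, 10))
    d_new = ttprp_direction(g_new, g_old, d_old)
    print(np.isclose(g_new @ d_new, -(g_new @ g_new)))   # True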


Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance o...
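To make the variance point concrete, the short sketch below, on a synthetic least-squares problem (an assumption here), measures the mean squared deviation of the mini-batch gradient from the full gradient and shows it decaying roughly like 1/batch_size.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10_000, 20
    A = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    x = rng.normal(size=d)

    full_grad = A.T @ (A @ x - y) / n        # exact full-batch gradient

    def minibatch_grad(batch_size):
        idx = rng.integers(n, size=batch_size)        # sampling with replacement
        Ab, yb = A[idx], y[idx]
        return Ab.T @ (Ab @ x - yb) / batch_size

    for batch_size in (1, 10, 100, 1000):
        # mean squared deviation from the full gradient, averaged over trials
        err = np.mean([np.sum((minibatch_grad(batch_size) - full_grad) ** 2)
                       for _ in range(200)])
        print(batch_size, err)   # err shrinks roughly in proportion to 1/batch_size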



Journal:
  • CoRR

Volume: abs/1703.06807

Pages: -

Publication date: 2017